Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v2][query] Create archive reader/writer using regular factory methods #6519

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

mahadzaryab1
Copy link
Collaborator

@mahadzaryab1 mahadzaryab1 commented Jan 10, 2025

Which problem is this PR solving?

Description of the changes

  • This PR changes the jaegerquery extension to remove usages of CreateArchiveSpanReader and CreateArchiveSpanWriter and replace them with their primary storage counterparts from the v2 storage API CreateTraceReader and CreateTraceWriter.
  • 🛑 This PR contains a breaking changes for users of jaeger-v2 that have an ElasticSearch storage configured for traces_archive 🛑

How was this change tested?

Checklist

@mahadzaryab1
Copy link
Collaborator Author

mahadzaryab1 commented Jan 10, 2025

@yurishkuro should this be marked as a breaking change? the es storage wont use the -archive alias anymore for jaeger-v2

Copy link

codecov bot commented Jan 10, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.25%. Comparing base (7189b52) to head (b557b31).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6519      +/-   ##
==========================================
+ Coverage   96.21%   96.25%   +0.04%     
==========================================
  Files         373      373              
  Lines       21378    21401      +23     
==========================================
+ Hits        20569    20600      +31     
+ Misses        616      610       -6     
+ Partials      193      191       -2     
Flag Coverage Δ
badger_v1 10.64% <ø> (ø)
badger_v2 2.77% <ø> (ø)
cassandra-4.x-v1-manual 16.54% <ø> (ø)
cassandra-4.x-v2-auto 2.71% <ø> (ø)
cassandra-4.x-v2-manual 2.71% <ø> (ø)
cassandra-5.x-v1-manual 16.54% <ø> (ø)
cassandra-5.x-v2-auto 2.71% <ø> (ø)
cassandra-5.x-v2-manual 2.71% <ø> (ø)
elasticsearch-6.x-v1 20.32% <ø> (+<0.01%) ⬆️
elasticsearch-7.x-v1 20.40% <ø> (+<0.01%) ⬆️
elasticsearch-8.x-v1 20.56% <ø> (+<0.01%) ⬆️
elasticsearch-8.x-v2 2.77% <ø> (-0.01%) ⬇️
grpc_v1 12.15% <ø> (-0.01%) ⬇️
grpc_v2 9.03% <ø> (+<0.01%) ⬆️
kafka-3.x-v1 10.33% <ø> (ø)
kafka-3.x-v2 2.77% <ø> (ø)
memory_v2 2.77% <ø> (ø)
opensearch-1.x-v1 20.44% <ø> (ø)
opensearch-2.x-v1 20.44% <ø> (+<0.01%) ⬆️
opensearch-2.x-v2 2.77% <ø> (ø)
tailsampling-processor 0.51% <ø> (ø)
unittests 95.14% <100.00%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Signed-off-by: Mahad Zaryab <[email protected]>
Copy link
Member

@yurishkuro yurishkuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a breaking change implied here, in the sense that if the users do not set the config options appropriately the new storage objects may not behave the same way as archive objects did previously. I recommend handling it with a feature gate from OTEL. We can declare the feature with Beta state (i.e. enabled right away, mark as breaking change), but the users have the option to turn it off manually and fall back onto the old behavior that we should preserve for now. Then in the next release we set the feature to Stable where turning it off will give a runtime error (could also be labeled a breaking change) so the new behavior is the only one possible. And then in the following release we remove the feature altogether along with the legacy code.

example in cmd/jaeger/internal/extension/remotesampling/config.go

yurishkuro
yurishkuro previously approved these changes Jan 15, 2025
Copy link
Member

@yurishkuro yurishkuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@yurishkuro yurishkuro changed the title [WIP][v2][query] Change jaegerquery extension to use primary reader/writer for archive storage [WIP][v2][query] Create archive reader/writer using regular factory methods Jan 15, 2025
@yurishkuro
Copy link
Member

I have a feeling we will still need to address the question of removing create-archive methods later because as we start LFX project and being implementing v2 API in the storage directly, we will run into issue with Elasticsearch implementation.

@yurishkuro yurishkuro dismissed their stale review January 15, 2025 02:47

We need to implement a feature gate

@yurishkuro
Copy link
Member

I think this is breaking backwards compatibility, is it not? I.e. to get the equivalent behavior as the previous version the user needs to set some parameters (which ones, what combination?)

@mahadzaryab1
Copy link
Collaborator Author

mahadzaryab1 commented Jan 16, 2025

Test Report

1. Establish Ground Truth on main

To begin, configure the storage and query settings in the all-in-one.yaml file as follows:

jaeger_storage:
  backends:
    some_storage:
      elasticsearch:
        indices:
          index_prefix: "jaeger-main"
          spans:
            date_layout: "2006-01-02"
            rollover_frequency: "day"
            shards: 5
            replicas: 1
          services:
            date_layout: "2006-01-02"
            rollover_frequency: "day"
            shards: 5
            replicas: 1
          dependencies:
            date_layout: "2006-01-02"
            rollover_frequency: "day"
            shards: 5
            replicas: 1
          sampling:
            date_layout: "2006-01-02"
            rollover_frequency: "day"
            shards: 5
            replicas: 1
    another_storage:
      elasticsearch:
        indices:
          index_prefix: "jaeger-archive"

2. Spin Up Elasticsearch Using Docker

To bring up Elasticsearch, run the following commands:

jaeger % cd docker-compose/elasticsearch/v8 
v8 % docker compose up

3. Start Jaeger

Start the Jaeger service:

jaeger % go run ./cmd/jaeger

4. Archive a Trace

From the Jaeger UI, select a trace and archive it.

traceID: 0dc3e460bd9b8e0dddfa29a2f751cfb9

Trace Screenshot

5. Update index_prefix

Stop Jaeger and modify the index_prefix in the primary configuration to jaeger-main-1. This ensures the query for the same trace is no longer found in the primary storage but will be found in the archive storage.

6. Query for the Same Trace

curl -s http://localhost:16686/api/traces/0dc3e460bd9b8e0dddfa29a2f751cfb9 | jq '.data[].spans[] | {traceID, operationName}'
{
  "traceID": "0dc3e460bd9b8e0dddfa29a2f751cfb9",
  "operationName": "/api/services"
}
{
  "traceID": "0dc3e460bd9b8e0dddfa29a2f751cfb9",
  "operationName": "GetService"
}
curl -s http://localhost:16686/api/v3/traces/0dc3e460bd9b8e0dddfa29a2f751cfb9 | jq '.result.resourceSpans[].scopeSpans[].spans[] | {traceId, name}'
{
  "traceId": "0dc3e460bd9b8e0dddfa29a2f751cfb9",
  "name": "GetService"
}
{
  "traceId": "0dc3e460bd9b8e0dddfa29a2f751cfb9",
  "name": "/api/services"
}

7. Test Changes from This PR

Stop Jaeger, checkout this PR, and restart Jaeger:

gh pr checkout 6519
go run ./cmd/jaeger

8. Query for the Same Trace After PR Changes

curl -s http://localhost:16686/api/traces/0dc3e460bd9b8e0dddfa29a2f751cfb9 | jq .
{
  "data": null,
  "total": 0,
  "limit": 0,
  "offset": 0,
  "errors": [
    {
      "code": 404,
      "msg": "trace not found"
    }
  ]
}
curl -s http://localhost:16686/api/v3/traces/0dc3e460bd9b8e0dddfa29a2f751cfb9 | jq .
{
  "error": {
    "httpCode": 404,
    "message": "No traces found"
  }
}

🛑 This is where the breaking change occurs 🛑

9. Mitigation for Users

Set use_aliases for Archive Storage

To mitigate the issue, set the use_aliases configuration for your archive storage to true. Update the configuration as follows:

another_storage:
  elasticsearch:
    indices:
      index_prefix: "jaeger-archive"
    use_aliases: true

Add Alias from Old Index to New Index

To ensure backwards compatibility, add an alias from the old index to the new index. You can query the current set of aliases in Elasticsearch with:

curl -X GET "http://localhost:9200/_aliases?pretty"
{
  "jaeger-main-1-jaeger-span-2025-01-16": { "aliases": {} },
  "jaeger-main-2-jaeger-span-2025-01-16": { "aliases": {} },
  "jaeger-archive-jaeger-span-archive": { "aliases": {} },
  "jaeger-main-1-jaeger-service-2025-01-16": { "aliases": {} }
}

To link the new index (jaeger-archive-jaeger-span-read) with the old index (jaeger-archive-jaeger-span-archive), run the following:

curl -X POST "http://localhost:9200/_aliases" -H 'Content-Type: application/json' -d'
{
  "actions": [
    {
      "add": {
        "index": "jaeger-archive-jaeger-span-archive",
        "alias": "jaeger-archive-jaeger-span-read"
      }
    }
  ]
}'

Confirm that the alias has been added:

curl -X GET "http://localhost:9200/_aliases?pretty"
{
  "jaeger-main-1-jaeger-span-2025-01-16": { "aliases": {} },
  "jaeger-archive-jaeger-span-archive": { "aliases": { "jaeger-archive-jaeger-span-read": {} } },
  "jaeger-main-2-jaeger-span-2025-01-16": { "aliases": {} },
  "jaeger-main-2-jaeger-service-2025-01-16": { "aliases": {} },
  "jaeger-main-1-jaeger-service-2025-01-16": { "aliases": {} }
}

Note that if you already had use_aliases set to true for your archive storage. Then for the example above, you would create an alias from jaeger-archive-jaeger-span-archive-read to jaeger-archive-jaeger-span-read

10. Restart Jaeger and Retry the Query

Finally, restart Jaeger and run the same trace queries again:

curl -s http://localhost:16686/api/traces/0dc3e460bd9b8e0dddfa29a2f751cfb9 | jq '.data[].spans[] | {traceID, operationName}'
{
  "traceID": "0dc3e460bd9b8e0dddfa29a2f751cfb9",
  "operationName": "/api/services"
}
{
  "traceID": "0dc3e460bd9b8e0dddfa29a2f751cfb9",
  "operationName": "GetService"
}
curl -s http://localhost:16686/api/v3/traces/0dc3e460bd9b8e0dddfa29a2f751cfb9 | jq '.result.resourceSpans[].scopeSpans[].spans[] | {traceId, name}'
{
  "traceId": "0dc3e460bd9b8e0dddfa29a2f751cfb9",
  "name": "/api/services"
}
{
  "traceId": "0dc3e460bd9b8e0dddfa29a2f751cfb9",
  "name": "GetService"
}

Signed-off-by: Mahad Zaryab <[email protected]>
@mahadzaryab1 mahadzaryab1 added changelog:breaking-change Change that is breaking public APIs or established behavior and removed changelog:bugfix-or-minor-feature labels Jan 16, 2025
mahadzaryab1 added a commit that referenced this pull request Jan 17, 2025
…6555)

## Which problem is this PR solving?
- Towards #6065 

## Description of the changes
- This PR implements a reverse adapter (SpanWriter) that wraps a native
v2 storage writer (tracestore.Writer) and downgrades it to implement the
v1 writer (spanstore.Writer).
- This will be used by #6519
to downgrade a native v2 writer so that it can be used by the v1 query
service

## How was this change tested?
- Added unit tests

## Checklist
- [x] I have read
https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md
- [x] I have signed all commits
- [x] I have added unit tests for the new functionality
- [x] I have run lint and test steps successfully
  - for `jaeger`: `make lint test`
  - for `jaeger-ui`: `npm run lint` and `npm run test`

---------

Signed-off-by: Mahad Zaryab <[email protected]>
@mahadzaryab1 mahadzaryab1 changed the title [WIP][v2][query] Create archive reader/writer using regular factory methods [v2][query] Create archive reader/writer using regular factory methods Jan 17, 2025
Signed-off-by: Mahad Zaryab <[email protected]>
@mahadzaryab1 mahadzaryab1 marked this pull request as ready for review January 17, 2025 01:10
@mahadzaryab1 mahadzaryab1 requested a review from a team as a code owner January 17, 2025 01:10
Signed-off-by: Mahad Zaryab <[email protected]>
@@ -140,26 +142,58 @@ func (s *server) addArchiveStorage(
return nil
}

f, err := jaegerstorage.GetStorageFactory(s.config.Storage.TracesArchive, host)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need this method anywhere? If yes, can that code be upgraded as well? If no, I would remove it.

reader tracestore.Reader,
writer tracestore.Writer,
) (spanstore.Reader, spanstore.Writer) {
v1Reader, err := v1adapter.GetV1Reader(reader)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we shouldn't be returning error from this function - it's not an error for reader not to be an adapter. It should return (Reader, bool)

Copy link
Member

@yurishkuro yurishkuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please move the test report into the description, so that it's more discoverable

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/storage changelog:breaking-change Change that is breaking public APIs or established behavior v2
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants